We are IntechOpen, the world's leading publisher of Open Access books Built by scientists, for scientists

Open access books available 5,300

130,000 155M

International authors and editors

Downloads

Our authors are among the

most cited scientists 154 TOP 1%

Selection of our books indexed in the Book Citation Index in Web of Science™ Core Collection (BKCI)

# Interested in publishing with us? Contact book.department@intechopen.com

Numbers displayed above are based on latest data collected. For more information visit www.intechopen.com

## **MM-GB(PB)SA Calculations of Protein-Ligand Binding Free Energies**

Joseph M. Hayes <sup>1</sup> and Georgios Archontis 2 *1 Institute of Organic & Pharmaceutical Chemistry National Hellenic Research Foundation, Athens <sup>2</sup>Department of Physics, University of Cyprus, Nicosia <sup>1</sup>Greece <sup>2</sup>Cyprus* 

#### **1. Introduction**

The importance of computational chemistry in modern scientific research is well established. Continuous improvement in software and algorithms for the modeling of chemical interactions has transformed molecular modeling into a powerful tool for many current day research projects. From a medical perspective, one of the ultimate goals in computer-aided drug design (CADD) is the accurate prediction of ligand-binding affinities to a macromolecular target, which can facilitate and speed the routine identification of new candidates in early stage drug discovery projects (Gilson & Zhou, 2007; Hayes & Leonidas, 2010). In particular, structure-based modeling provides an efficient pathway towards exploiting known three-dimensional structural data in the design and proposal of new molecules for experimental evaluation. Docking calculations are now widely used in highthroughput virtual screening of structurally diverse molecules from available compound libraries/databases against specific targets. Once initial "hits" or "lead" molecules are identified (normally low µM inhibitors), modification of their chemical features in the "lead optimization" phase can improve their binding affinities and fine-tune other desirable druglike properties. However, docking calculations currently have limited success beyond the lead identification stage, where more accurate lower-throughput computational methods are needed. In this regard, the Molecular Mechanics/Generalized Born Surface Area (MM-GBSA) and Molecular Mechanics/Poisson-Boltzmann Surface Area (MM-PBSA) methods calculate binding free energies using molecular mechanics (forcefields) and continuum (implicit) solvation models (Kollman et al., 2000).They have been successfully applied across a range of targets and are implemented in software programs such as Amber (Case et al., 2005), Delphi (Rocchia et al., 2001) and Schrödinger (Du et al., 2011). With a target readership from beginner to expert, the current chapter provides an extensive and critical overview of MM-GB(PB)SA methods and their applications. The theoretical foundation of the MM-GB(PB)SA method is first described. We then discuss key aspects which improve the accuracy of results, and highlight potential caveats due to the approximations inherent in the methods. The chapter concludes with a review of recent representative applications, which illustrate both successes and limitations. The emphasis of this chapter is on structure-

based drug design (SBDD) efforts, however, the methods have widespread applicability in other areas such as in supramolecular chemistry.

#### **2. Theoretical background – "Pathway" and "Endpoint" methods**

Low-throughput computational approaches for the calculation of ligand binding free energies can be divided into "pathway" and "endpoint" methods (Deng & Roux, 2009; Gilson & Zhou, 2007). In pathway methods, the system is converted from one state (e.g., the complex) to the other (e.g., the unbound protein/ligand). This can be achieved by introducing a set of finite or infinitesimal "alchemical" changes to the energy function (the Hamiltonian) of the system through free-energy perturbation (FEP) or thermodynamic integration (TI), respectively (Kollman, 1993; Straatsma & McCammon, 1992). The fundamentals of FEP and TI methods were introduced many decades ago by John Kirkwood (Kirkwood, 1935) and Robert Zwanzig (Zwanzig, 1954). In recent years, their use in the computation of absolute binding affinities has become feasible due to increases in computational power, the development of more accurate models of atomic interactions (Cornell et al., 1995; MacKerell et al., 1998; van Gunsteren et al., 1996), the clarification of the underlying theoretical framework and the introduction of methodological advances (Boresch et al., 2003; Bowers et al., 2006; Deng and Roux, 2009; Gilson et al., 1997; Gilson & Zhou, 2007; Lee & Olson, 2006; Mobley et al., 2007). Combined with atomistic molecular dynamics (MD) or Monte Carlo (MC) simulations in explicit water solvent models, they are arguably the most accurate methods for calculating absolute or relative ligand binding affinities.

The "alchemical" computation of *differences* in binding affinities (rather than *absolute*  affinities) among a set of related ligands for the same target protein is more accurate and technically simpler. A thermodynamic cycle illustrating the basic principles is shown in Figure 1 (Tembe & McCammon, 1984). The horizontal legs describe the experimentally accessible actual binding processes, with free energies Gbind(L1) and Gbind(L2). Since the free energy is a state function, the relative binding free energy Gbind is exactly equal to the difference of the free energies in the horizontal or vertical legs:

$$
\Delta\Delta G\_{\text{bind}} = \Delta G\_{\text{bind}} \text{(L.2)} - \Delta G\_{\text{bind}} \text{(L1)} \tag{1}
$$

$$\begin{array}{ccccc} \stackrel{\frown}{\longrightarrow} \Delta \stackrel{\frown}{\mathrm{complex}} & \stackrel{\frown}{\longrightarrow} \hat{\mathrm{L2}} \end{array} \begin{array}{ccccc} \stackrel{\frown}{\mathrm{L2}} & \stackrel{\frown}{\longrightarrow} \hat{\mathrm{L2}} \end{array} \begin{array}{ccccc} \stackrel{\frown}{\longrightarrow} \hat{\mathrm{L2}} \end{array} \begin{array}{ccccc} \stackrel{\frown}{\longrightarrow} \cdots \end{array} \begin{array}{ccccc} \stackrel{\frown}{\longrightarrow} \cdots \end{array} \begin{array}{ccccc} \mathfrak{2} \end{array}$$

The simulations follow the vertical steps (Eq. (2)) or unphysical processes, by simulations in water solution that gradually change the energy-function of the system from one "endpoint" to the other through a series of intermediate hybrid states.From Figure 1, this involves the stepwise "alchemical" transformation of ligand L1 to L2 both in its 'free' state (unbound) and in the bound complex, through gradual changes in the forcefield parameters describing the ligand interactions. This leads to the free energy changes Gfree(L1→L2) and Gcomplex(L1→L2), respectively. Averaging over both transformation directions is often used to improve the free-energy estimates, although this is not always the case (Lu & Woolf, 2007). These calculations can be accurate, if conducted with the appropriate care. An overview of current state-of-the-art methods for absolute and relative affinity calculations is in (Chodera et al., 2011).

Fig. 1. Thermodynamic cycle linking the binding of two ligands L1 and L2 to a protein in solution.

To conclude, the emerging implementation of biomolecular codes on GPU architectures (Harvey et al., 2009; Stone et al., 2011) and the development of simple free-energy protocols (Boresch & Bruckner, 2011) make atomistic methods of absolute or relative affinities very promising for larger-scale calculations in the near future. Nevertheless, at present they are still relatively time-consuming, and require considerable expertise and planning. They preclude the consideration of more than a few complexes per day on a dedicated CPU cluster with a few tens of nodes. A trade-off between computational expense and accuracy is therefore required when the goal is to investigate and compare the binding strengths of a structurally diverse and/or larger set of ligands via MD simulations. For this purpose, much less computationally demanding "endpoint" methods are often successfully applied, such as the "linear interaction energy" (LIE) (Åqvist et al., 2002) or the molecular mechanics – Poisson Boltzmann (MM-PBSA) (Kollman et al., 2000) and the related molecular mechanics – generalised Born (MM-GBSA) approximation (Gohlke et al., 2003). All these methods compute binding free energies along the horizontal legs of Figure 1, but use only models for the "endpoints" (bound and unbound states). The MM-PB(GB)SA methodology is now described in more detail.

#### **3. The MM-GB(PB)SA methodology**

Using MM-GBSA and MM-PBSA methods, relative binding affinities for a set of ligands to a given target can often be reproduced with good accuracy and considerable less computational effort compared to full-scale molecular dynamics FEP/TI simulations. Furthermore, free-energies can be decomposed into insightful interaction and desolvation components (Archontis et al., 2001; Hayes et al., 2011; Polydoridis et al., 2007).

#### **3.1 Thermodynamics & calculation framework**

In the MM-GB(PB)SA formulation, the binding free energy of a ligand (L) to a protein (P) to form the complex (PL) is obtained as the difference (Pearlman, 2005):

$$
\Delta G\_{\text{bind}} = \text{G(PL)} - \text{G(P)} - \text{G(L)}\tag{3}
$$

The free energy of each of the three molecular systems P, L, and PL is given by the expression:

$$\mathbf{G(X)} = E\_{\text{MM}}(X) + \mathbf{G}\_{\text{solv}}(X) - TS(X) \tag{4}$$

In Eq. (4), EMM is the total molecular mechanics energy of molecular system X in the gas phase, Gsolv is a correction term (solvation free energy) accounting for the fact that X is surrounded by solvent, and S is the entropy of X.

To apply the MM-GB(PB)SA formulation, a representative set of equilibrium conformations for the complex, free protein and free ligand are first obtained by atomistic MD simulations in explicit solvent. In this post-processing phase, the solvent is discarded and replaced by a dielectric continuum. Changes (∆) in the individual terms (∆EMM, ∆Gsolv, -T∆S) of Eq.(4) between the unbound states and the bound (complex) state are calculated, and contribute to the binding free energies according to Eq.(3). Computation of each of the terms in Eq. (4) is now described in more detail.

EMM is the sum of the bonded (internal), and non-bonded electrostatic and van der Waals energies

$$E\_{\rm MM} = E\_{\rm bonded} + E\_{\rm elec} + E\_{\rm vw} \tag{5}$$

These energy contributions are computed from the atomic coordinates of the protein, ligand and complex using the (gas phase) molecular mechanics energy function (or forcefield). The solvation free energy term Gsolv contains both polar and non-polar contributions. The polar contributions are accounted for by the generalized Born, Poisson, or Poisson-Boltzmann model, and the non-polar are assumed proportional to the solvent-accessible surface area (SASA), *c.f.* Section 3.2:

$$\mathbf{G}\_{\text{solv}} = \mathbf{G}\_{\text{PB(GB)}} + \mathbf{G}\_{\text{SASA}} \tag{6}$$

Finally, the entropy S is decomposed into translational, rotational and vibrational contributions. The first two are computed by standard statistical-mechanical expressions, and the last is typically estimated from a normal-mode (harmonic or quasiharmonic) analysis (Brooks et al., 1995; Karplus & Kushick 1981; Tidor & Karplus 1994). In practice, current software implementations normally determine all three contributions to S as part of a normal-mode analysis.

To improve the accuracy of the computed binding free energies, the various terms of Eq. (4) are averaged over multiple conformations or MD snapshots (typically a few hundred for the EMM and Gsolv contributions). Depending on the extent of conformational fluctuations in the system under consideration, the convergence into stable values may require relatively long (multi-ns) simulations. The computation of the entropy term, however, requires the extensive minimization of the trajectory conformations for the protein, ligand and complex to local minima on the potential energy surfaces, followed then by normal mode analysis. This procedure is costly and prevents the consideration of a large number of conformations; insufficient sampling can therefore sometimes an issue. To decrease the computational cost, the protein can be truncated beyond a certain cutoff distance and the system minimized using a distance-dependent dielectric, which simulates the deleted surroundings (Kongsted & Ryde, 2009). However, a large variation of the entropy term often results from these 'free' minimizations. Including a fixed buffer region (with water molecules) beyond the cut-off can lead to more stable entropy predictions (Kongsted & Ryde, 2009).

The internal energy terms (Ebonded) of the protein and complex can be on the order of a few thousand kcal/mol, and can introduce large uncertainties in the computed binding free

energies. This is prevented in the "single-trajectory" approximation (Gohlke & Case, 2004; Page & Bates, 2006), which employs simulations of a single state (the complex) to generate conformations for all three states (complex, protein and ligand). For each MD conformation sampled, the resulting internal energy terms of the protein and ligand are identical in the bound and the unbound states and cancel exactly in Eq. (3). Hence, effectively only the protein-ligand (non-bonded) interaction energies of the EMM term in Eq. (5) contribute to ∆Gbind. 'Single-trajectory' simulations significantly reduce computational effort and are generally sufficiently accurate for most applications. The downside of the approximation is that any explicit structural relaxation of the protein and ligand upon binding is ignored. Although charge reorganization can be partly taken into account implicitly by setting the protein/ligand (internal) dielectric constants (Section 3.2) to values larger than εin = 1-2 (Archontis et al., 2001; Archontis & Simonson, 2005; Schutz & Warshel, 2001; Simonson, 2003), the neglect of explicit structural relaxation may introduce errors depending on the system (Tamamis et al*.*, 2010). Separate MD simulations for the complex, and unbound receptor and ligands may also be performed (the "three-trajectory" approximation) but require greater computational effort, although in theory should yield more accurate results. Indeed, Yang and co-workers have recently shown that including separate simulations for the ligand and accounting for the "ligand reorganization" free energy led to significant improvements in binding affinity predictions for a set of ligands targeting XIAP (Yang et al., 2009). In certain cases, therefore, the added expense of separate simulations may be justified.

#### **3.2 Solvation models – GBSA and PBSA**

Proteins function usually inside aqueous solutions or in membrane environments, which are in the vicinity of an aqueous medium. The surrounding solvent can influence protein stability and function, ligand binding and protein-protein association. Since the solvent modifies in a non-trivial manner the intramolecular and intermolecular interactions, an accurate inclusion of solvent effects in biomolecular modeling and simulation is a challenging task.

Currently, the most accurate treatment in molecular simulations is achieved by atomicdetail models that represent explicitly the biomolecule and its surrounding environment. Several water models are used successfully to describe the aqueous environment in atomistic simulations; examples include SPC (Berendsen et al., 1981), SPC/E (Berendsen et al., 1987), TIP3P and TIP4P (Jorgensen et al., 1983), and TIP5P (Mahoney & Jorgensen, 2000). In practice, the explicit inclusion of water leads to a considerable increase in both the size of the simulation system and the computational cost of the simulation itself. Furthermore, the computation of solvation or binding free energies requires an exhaustive sampling of the solvent degrees of freedom. A much less costly approach is to represent the solvent implicitly in the simulation, through the incorporation of additional "potential of mean force" terms (Roux, 2001; Roux & Simonson, 1999) in the gas-phase energy function (e.g., Eq. (7) below). These terms depend only on the atomic coordinates of the solute, and express the solute free energy for a given configuration, after the solvent degrees of freedom have been "integrated out" (Roux, 2001; Roux & Simonson, 1999). Thus, the simulation system has the same number of degrees of freedom as in the gas phase and there is no need for explicit sampling over solvent degrees of freedom. The MM-PB(GB)SA method considered here, combine atomistic simulations in explicit solvent for the generation of representative

biomolecular conformations with an implicit-solvent estimation of the binding free energies, in a post-processing step.

Conceptually, most implicit solvent models decompose the solvation process into three sequential steps (Cramer & Truhlar, 1999): i) creation of a cavity in solution to accommodate the biomolecule; ii) switching-on dispersion interactions between the biomolecule and surrounding medium, while all atomic charges are set to zero; and iii) switching-on the biomolecular charges. The solvation free energies of steps i) and ii) are normally assumed to be proportional to the SASA of the biomolecule and represent the *non-polar* contributions (GSASA) to Gsolv in Eq. (6), although the validity of this approximation has been questioned for step ii) (Levy et al., 2003). With a positive coefficient of proportionality, an increase in the SASA is associated with an unfavorable increase in solvation free energy, which is partly accounted for by the tendency of non-polar residues to be solvent-excluded. The equation typically used is of the form:

$$\mathbb{C}\_{SASA}(\mathbf{X}) = \gamma \text{SASA} + \beta \tag{7}$$

with the ┛ and ┚ parameter values dependant on the method and solvation model (PBSA or GBSA) used (Rastelli et al., 2010).

Meanwhile, step iii) calculates the contribution to solvation free energy due to the charge /electrostatic interactions of the solute with the surrounding solvent, the *polar* contributions (GPB(GB)) to Gsolv in Eq. (6). In continuum-electrostatics models such as PB and GB, the solute is treated as a low-dielectric cavity embedded in a high dielectric medium. The solute charges are in the simplest and most common approximation centered on the individual atoms. The resulting solvation free energy of a molecule X is expressed as (Simonson, 2003):

$$\text{G}\_{\text{PB(GB)}}(X) = \frac{1}{2} \sum\_{i,j \in X} q\_i q\_j g\_{ij} \,^{PB(GB)} \tag{8}$$

where the summation is over all the atomic charges {qi}. The quantity gij PB(GB) is determined using the PB model by numerical solution of the Poisson or Poisson-Boltzmann equation (depending on the existence of salt), or using the GB model by an analytical expression with the functional form (Simonson, 2003; Still et al., 1990):

1/ 1 1 exp *n n GB n ij ij ij ij ij r g r B AB* (9)

The parameters Bij depend on the position (distance from the solute-solvent dielectric boundary) of atoms *i* and *j*, and the shape of the entire biomolecule; ε is the solvent dielectric constant, and rij is the distance between *i* and *j*. The constants n and A were set to n=2 and A=4 in the original formulation of Still and coworkers (Still et al., 1990).

In the PB model, the solute dielectric constant (εin) affects the computed functions gij PB and Eq. (8). Meanwhile, in the GB model, the solute dielectric constant drops out from the final expression in Eq.(9), due to the approximations used to arrive at an analytic formula. An εin value other than 1 can still be used; in this case, the first parenthesis on the right-hand side of Eq. (9) becomes (1/ε –1/εin) and the GB expression yields the free energy of transferring

the solute from an infinite reference medium with dielectric constant εin into solution. A careful discussion of this point is in (Bashford & Case, 2000).

Application of Eq. (8) to a protein:ligand complex (PL) and the dissociated protein (P) and ligand (L) yields the electrostatic (polar) solvation free energy contribution to Eq. (3):

$$
\Delta G\_{\rm PB(GB)} = G\_{\rm PB(GB)}(PL) - G\_{\rm PB(GB)}(P) - G\_{\rm PB(GB)}(L) \tag{10}
$$

An advantage of these methods is that they facilitate the decomposition of the total solvation free energy into insightful components. Indeed, the summation over atomic charges in Eq. (8) implies that the electrostatic free energy of Eq. (10) can be partitioned into interaction and desolvation components (Archontis et al., 2001; Hendsch & Tidor, 1999):

$$\Delta \mathbf{G}\_{\text{PB(GB)}} = \sum\_{i \in P, j \in L} q\_i g\_{ij}^{PL} q\_j + \left\lfloor \frac{1}{2} \sum\_{i, j \in P} q\_i g\_{ij}^{PL} q\_j - \frac{1}{2} \sum\_{i, j \in P} q\_i g\_{ij}^{P} q\_j \right\rfloor + \left\lfloor \frac{1}{2} \sum\_{i, j \in L} q\_i g\_{ij}^{PL} q\_j - \frac{1}{2} \sum\_{i, j \in P} q\_i g\_{ij}^{L} q\_j \right\rfloor \tag{11}$$

The notation gij <sup>X</sup> implies that the interaction between charges *i* and *j*, as determined by the function gij, is in general different in the complex, free protein and free ligand. The first term on the right-hand side of Eq. (11) is the "interaction term" and arises from direct interactions between the protein and ligand charges, which are only present in the complex; the next term [the first brace] is a protein "desolvation" term, arising from the replacement of highdielectric solvent by the low-dielectric ligand in the protein vicinity, as well as structural relaxation and changes in the charge distribution of the protein. The last term [second brace] corresponds to the desolvation of the ligand.

The "interaction term" can be further decomposed into contributions from specific residues (Archontis et al., 2001; Hendsch & Tidor 1999). For example, the contribution of a protein residue R to this term is given by the following expression

$$
\Delta G\_{\text{PB(GB)}}^{R} = \sum\_{i \in R, j \in L} q\_i g\_{ij}^{PL} q\_j \tag{12}
$$

These components provide useful insights on the origin of the binding free energy values. They can help interpret differences in binding affinities for a series of related complexes and guide the design of modified proteins or ligands. For example, in a study of amino acid binding to native and mutant aspartyl-tRNA synthetase, residue decomposition identified the protein residues discriminating between the cognate ligand (aspartic acid) and the analogue asparagine (Archontis et al., 2001). In another study of RNAse A recognition by dinucleotidic inhibitors, a similar decomposition attributed the stronger binding of the most potent inhibitor to interactions with two active site lysines (Polydoridis et al., 2007).

Finally, Hou and co-workers very recently evaluated the performance of MM-GBSA and MM-PBSA for predicting binding free energies based on molecular dynamics simulations (Hou et al., 2011a). Their results showed that MM-PBSA performed better in calculating absolute binding free energies compared to MM-GBSA but not necessarily for the relative binding free energies, sufficient for most applications in computational drug design. Interestingly, in a study of the accuracy of continuum solvation models for drug-like molecules, GB methods typically were more stable and gave more accurate results that the widely used PB methods (Kongsted et al., 2009).

#### **3.3 Selection of MD trajectory conformations/snapshots**

A recent study for the binding of seven biotin analogues to avidin suggested that to obtain statistically converged MM-GBSA results, several independent simulations each with sampling times of 20-200 ps (averaging the results) is more effective than a single long simulation (Genheden & Ryde, 2010). 'Single-trajectory' simulations of the complex are generally sufficiently accurate for most applications, and while MD simulation length does have an obvious impact on the accuracy of predictions, longer MD simulations doesn't necessarily mean better predictions (Hou et al., 2011a). For the calculations of the ∆EMM and ∆Gsolv terms, a large ensemble (e.g. several hundred) conformations are typically extracted in small intervals from the single MD trajectory of the complex. Alternatively, averaging over a select few receptor-ligand binding conformations from the MD trajectory via clustering has proved effective (Section 4.1), as well as more time efficient (Hayes et al., 2011). MM-GB(PS)SA calculations on single (minimized) structures has also recently been proposed and validated (Kuhn et al., 2005; Rastelli et al., 2010), but not necessarily for structures generated from MD simulations (Section 4.2). Meanwhile, for the entropy term calculated using normal mode analysis, fewer snapshots (typically less than a 100) are employed, due to the computational cost involved. As already highlighted (Section 3.1), a larger number of snapshots may be required for more stable and accurate predictions. These calculations, however, are computationally expensive and often not feasible with limited computational resources. Consequently, neglect of the entropy term can in some cases lead to sufficient or more accurate predictions for ranking of ligand binding affinities in certain macromolecular systems (Hayes et al., 2011; Hou et al., 2011a; Rastelli et al., 2010).

#### **3.4 Limitations & caveats of MM-GB(PB)SA calculations**

MM-GB(PB)SA methods are widely recognised as valuable tools in CADD applications. However, as with any method they have limitations and caveats, which need to be considered. First, while useful for ranking relative ligand binding affinities, these methods lack the required accuracy for absolute binding free energy predictions (Hou et al., 2011a; Singh & Warshel, 2010).The inclusion of entropic contributions brings the MM-GB(PB)SA values somewhat closer to experimental absolute affinities (Gilson & Zhou, 2007). However, such entropic terms are costly and contain large uncertainties. Force-field inconsistencies may also be an issue: PB and GB results depend strongly on adequate atomic charges and van der Waals radii, which are often optimized for MD simulations. The MM-GB(PB)SA results may be influenced by system-dependent properties, such as the features of the binding site, the extent of protein and ligand conformational relaxation upon association, and the protein and ligand charge distribution (Kuhn et al., 2005; Hou et al., 2011a). Continuum electrostatics models ignore the molecular structure of the solvent; in some cases this might affect the results, particularly when key receptor-ligand interactions are bridged by water molecules, *c.f.* Section 4.1 (Hayes et al., 2011). Furthermore, the value of the protein/ligand dielectric constant is empirically chosen, and takes into account not only the protein and ligand structural relaxation, but also other error-introducing factors such as the ones mentioned above (Archontis & Simonson, 2005; Schutz & Warshel, 2001). Hou and coworkers suggested in a recent MM-PBSA study that the use of εin = 4 for a highly charged protein-ligand binding interface, εin = 2 for a moderately charged binding interface and εin = 1 for a hydrophobic binding interface may improve ligand ranking (Hou et al., 2011a). The lack of a consistent optimum dielectric constant for MM-PBSA calculations has been noted by other workers (see for e.g. (Aleksandrov et al*.* 2010)), although generally a value εin =4 often gives satisfactory results (Aleksandrov et al., 2010; Archontis et al., 2001; Thompson et al., 2006). As a final note, MM-GB(PB)SA calculations require some degree of user expertise and planning, from the initial set-up and analysis of the MD simulations through to the binding free energy calculations themselves.

#### **3.5 Variations & extensions of MM-GB(PB)SA**

While molecular docking algorithms are computationally efficient methods used to screen a large number of ligands against a given target in reasonable time, generating and postprocessing MD ensembles for more than a few receptor-ligand structures in MM-GB(PB)SA calculations is currently impractical. Recently, however, MM-GB(PB)SA methods are receiving plaudits as post-docking methods in virtual screening experiments. Postprocessing of single docking poses using MM-GB(PS)SA algorithms can improve correlations between predicted and experimental binding affinities with a number of successes reported, *c.f.* for example (Du et al., 2011; Hou et al., 2011b; Lyne et al.; 2006). This is consistent with the previously mentioned discovery that select single receptor-ligand structures can prove as accurate as sampling over large numbers of MD trajectory snapshots in MM-GB(PB)SA applications (Kuhn et al., 2005; Rastelli et al., 2010). Post-docking MM-GBSA is implemented in Schrödinger software using the program Prime, with options to include receptor and ligand flexibility; the entropy term is neglected by default. Other recent extensions of MM-PBSA exploit quantum mechanics (QM) methods in QM/MM-PBSA calculations (Gräter et al., 2005; Manta et al., 2012; Wang & Wong, 2007). Here, a "hybrid" gas phase energy term (EQM/MM) effectively replaces the pure molecular mechanics energy (EMM) term in Eq. (5). Representing the ligand by QM has the advantage, for example, to eliminate a frequently encountered problem of deficient ligand forcefield parameters. However, calculations of this sort are significantly more expensive than MM-GB(PB)SA, and therefore are typically only viable for binding predictions on relatively few ligands.

### **4. Recent applications of MM-GB(PS)SA**

In the present section, we will review recent examples of the use of MM-GB(PS)SA calculations for calculating ligand binding free energies highlighting both successes and limitations.

#### **4.1 Example 1: Phosphorylase kinase ATP-binding site inhibitors**

With an aim towards glycogenolysis control in type 2 diabetes, indirubin (IC50> 50 µM), indirubin-3'-oxime (IC50 = 144 nM), KT5720 (Ki = 18.4 nM) and staurosporine (Ki = 0.37 nM) were investigated as phosphorylase kinase (PhK┛trnc) ATP-binding site inhibitors (Hayes et al., 2011).

Due to the lack of experimental structural information for binding of these ligands (Figure 2), MD simulations in explicit solvent using Desmond 2.0 (Bowers et al., 2006) were performed for each receptor-inhibitor complex, with 4 ns production runs following an initial equilibration period. A computationally-efficient multiple timestep RESPA integration algorithm was employed with timesteps of 2, 2 and 6 fs for bonded, "near" and "far" non-bonded interactions, respectively. Energy and trajectory atomic coordinate data

Fig. 2. The ATP-binding site inhibitors of phosphorylase kinase, as studied in Ref. (Hayes et al., 2011).

were recorded every 1.2 and 2.1 ps, respectively. For the trajectory analysis preceding the 'single-trajectory' based MM-GBSA calculations, both VMD (Humphrey et al., 1996) and Desmonds' Maestro simulation analysis tools were employed. The extracted MD trajectory binding site conformations (inhibitors + residues within 7 Å) of each complex were clustered into 10 groups based on atomic root-mean-square-distances (RMSDs). Every second frame (snapshot) of the last 3 ns of the production run (analysis phase) was used in the hierarchial clustering algorithm employed by the Desmond Maestro's Trajectory Clustering module. The representative complex of each of the 10 binding site cluster families (waters and counterions deleted) was then used in MM-GBSA calculations of binding free energies using Eq. (3). Schrödinger software with the MacroModel 9.7 Embrace module was used for the *∆EMM* and *∆Gsolv* calculations, while the entropy change, *∆S*, was calculated for the minimized representatives using Rigid Rotor Harmonic Oscillator (RRHO) calculations. Using this algorithm, the change in vibrational, rotational and translational (VRT) entropy of the ligands on binding was estimated. Finally, the thermodynamic average *∆Gbind* values (300 K) were estimated using the corresponding values for the 10 cluster representatives:

$$
\Delta \mathbf{G}\_{bind} = \sum\_{i=1}^{10} p\_i \Delta \mathbf{G}\_{bind}(i) \tag{13}
$$

The sum *i* was over the 10 cluster representatives, with *pi* defined as the cluster frequency:

$$p\_i = \frac{N\_i}{N\_{\text{total}}} \tag{14}$$

where *Ni* the number of frames in cluster *i*, and *Ntotal* the total number of frames.

Fig. 3. Predicted binding of staurosporine (gray) at the ATP-binding site of PhK┛trnc. Key interacting residues are colored by type – polar/charged residues shown in red (D104, E110 and E153) and hydrophobic in green (M106). Hydrogen bonds are formed with M106 and D104 backbones (hinge region), E110 and E153.

The relative binding affinities from the final *∆Gbind* values were generally in agreement with experiment, except the rankings of KT5720 and staurosporine (representative complex shown in Figure 3) were reversed by 0.7 kcal/mol. The discrepancy was accounted for by neglect of certain key contributions to the binding free energies using the MM-GBSA algorithm. Notably, accounting for and estimating the loss of the conformational entropy for the more flexible KT5720 ligand yielded *∆Gbind* values 1.2 – 2.6 kcal/mol in favour of staurosporine binding and hence in agreement with the experimental rankings. Further, whereas staurosporine had no key receptor-ligand bridging waters, the MD simulations revealed a key role of bridging waters for KT5720 binding. The entropy loss associated with a bound water molecule in a protein-ligand complex is sometimes important (but typically neglected in MM-GB(PB)SA calculations), with an upper bound free energy cost of 2 kcal/mol at 300 K suggested (Dunitz, 1994).

#### **4.2 Example 2: Interpretation of species specificity of compstatin, a peptidic inhibitor of the complement system**

The complement system provides the first line of defense against the invasion of foreign pathogens (Mastellos et al., 2003). Its inappropriate activation may cause or aggravate several pathological conditions, including asthma, macular degeneration, rheumatoid arthritis, and rejection of xenotransplantation. The 13-residue compstatin is a promising candidate for the therapeutic treatment of unregulated complement activation (Janssen et

al., 2007; Morikis & Lambris, 2005). The conformation of the human C3 – compstatin complex is shown in Figure 4. Compstatin interacts closely with four protein sectors indicated by thick yellow tubes. An important conclusion, drawn from experimental studies with a large number of species, was that compstatin inhibits the key complement component protein C3 from several primate species, but is inactive against the corresponding protein from lower mammals, precluding the testing of compstatin-based analogues in animal models (Sahu et al., 2003). To understand this species specificity, Tamamis and co-workers compared the stabilities of compstatin complexes with human or rat C3 (Tamamis et al., 2010) by atomistic MD simulations and an MM-GBSA analysis. In the simulations of the rat C3 complex, specific protein sectors near the compstatin binding site underwent reproducible localized displacements, which eliminated or weakened critical protein-ligand interactions. In agreement with the simulations and the lack of compstatin activity against rat C3, a MM-GBSA analysis estimated the binding free energy of the human C3 complex to be stronger by -9 kcal/mol relative to the rat C3 complex, in the "singletrajectory" approximation. If protein and ligand relaxation were taken into account by a "three-trajectory" approximation, the relative binding free energy increased (in absolute value) to -19 kcal/mol. Thus, in this system the neglect of relaxation effects introduced a significant error, even though the qualitative conclusion was correct in both cases.

Fig. 4. Representation of the human C3c-compstatin complex. The compstatin main chain is shown in a red tube and licorice representation. The protein C3c is shown as a gray tube; only residues 329-534 and 607-620 are included, corresponding to the simulation system of Ref. (Tamamis et al, 2010). The yellow thick tubes show four protein sectors in proximity to compstatin (488-492, 454-462, 344-349 and 388-393, from left to right). The right-most sector 388-393 moves away from compstatin in the simulations of non-primate C3 complexes (Tamamis et al, 2010; Tamamis et al, 2011).

Guided by the above MM-GBSA study, subsequent simulations investigated the stabilities of compstatin complexes with "transgenic" variants of mouse C3, containing 6-9 substitutions from the human C3 sequence near the compstatin binding site. The MM-GBSA binding affinities of the resulting transgenic complexes were comparable to the one of the human C3 complex, and by -8 to -9 kcal/mol stronger relative to the mouse C3:compstatin complex (Tamamis et al., 2011). More recent simulations have investigated the affinities of a large number of human and mouse or rat C3 complexes with compstatin analogues, producing promising results (Tamamis et al., 2012). Thus, the combined study of a series of related protein-ligand complexes by atomistic simulations and an efficient evaluation of the corresponding affinities, such as in the MM-GB(PB)SA formulation, provides a powerful way to design new proteins or ligands.

#### **4.3 Example 3: Fast predictions of binding free energies using MM-GB(PB)SA**

Rastelli and co-workers (Rastelli et al., 2010) explored the reliability of using a single energy minimized receptor–ligand complex in MM-GB(PB)SA calculations to estimate ligand binding affinities for a series of structurally diverse inhibitors (Figure 5) of *Plasmodium falciparum* DHFR (Figure 6) with known binding modes and affinities.

Fig. 5. Sample of the structurally diverse *Pf*DHFR inhibitors studied in Ref. (Rastelli et al., 2010).

Fig. 6. Wild-type *Pf*DHFR in complex with NADPH (yellow) and the inhibitor WR92210 (green). This protein structure taken from PDB code 1J3I was used for the calculations in Ref. (Rastelli et al., 2010) and as described in Example 3.

They obtained excellent correlations between MM-PBSA or MM-GBSA binding affinities and experimental values, similar to those obtained after averaging over multiple snapshots from periodic boundary MD simulations in explicit water in the traditional sense, but with significant savings on computational time and effort. Different methods were used for generating the structures for the MM-GB(PB)SA calculations from minimizations in implicit and explicit solvent models, to minimization using a distance dependent dielectric function, and finally minimization followed by a short MD simulation and then re-minimization. The approach has been implemented in an automated workflow called BEAR (Binding Estimation After Refinement) which produces both MM-GBSA and MM-PBSA predictions of binding free energies, and is fast enough to be suitable for virtual screening applications (Degliesposti et al., 2011; Rastelli et al., 2009).

### **5. Conclusion**

MM-GBSA and MM-PBSA are computationally efficient, end-point free energy methods that have been widely used to study protein-ligand binding affinities. Even though they lack the sound theoretical foundations of recently developed computationally demanding absolute-affinity free-energy methods (Boresch et al., 2003; Deng & Roux, 2009; Gilson & Zhou, 2007; Lee & Olsen, 2006), their connection with statistical thermodynamics has been established (Swanson et al., 2004). Due to the approximations inherent in MM-GB(PB)SA methods, they are more applicable for ranking ("scoring") of ligand binding affinities rather than to quantitatively predicted absolute binding free energies. They should be regarded as approximate, as they combine a molecular mechanics energy function with a continuumelectrostatics treatment of solvation effects; they include solute conformational entropy effects in an approximate manner (Singh & Warshel, 2010); and ignore the solvent molecular structure. Accurate incorporation of solute entropy (Foloppe & Hubbard, 2006) and solvent effects in binding affinity calculations is challenging, but future extensions and development of MM-GB(PB)SA methods will undoubtedly serve to address these limitations.

#### **6. Acknowledgment**

JMH acknowledges funding from the SP4-Capacities Coordination and Support Action, Support Actions, EUROSTRUCT project (CSA-SA\_FP7-REGPOT-2008-1 Grant Agreement N° 230146).

#### **7. References**


clusters, *Proceedings of the ACM/IEEE Conference on Supercomputing (SC06)*, ISBN 0- 7695-2700-0, Tampa, Florida, USA, Nov 11-17, 2006


MM-GB(PB)SA Calculations of Protein-Ligand Binding Free Energies 187



Morikis D, Lambris J. D.Editors, pp. 317-340, CRC Press/Tayor& Francis Group, ISBN 0824725409, Boca Raton, FL



**Molecular Dynamics - Studies of Synthetic and Biological Macromolecules** Edited by Prof. Lichang Wang

ISBN 978-953-51-0444-5 Hard cover, 432 pages **Publisher** InTech **Published online** 11, April, 2012 **Published in print edition** April, 2012

Molecular Dynamics is a two-volume compendium of the ever-growing applications of molecular dynamics simulations to solve a wider range of scientific and engineering challenges. The contents illustrate the rapid progress on molecular dynamics simulations in many fields of science and technology, such as nanotechnology, energy research, and biology, due to the advances of new dynamics theories and the extraordinary power of today's computers. This second book begins with an introduction of molecular dynamics simulations to macromolecules and then illustrates the computer experiments using molecular dynamics simulations in the studies of synthetic and biological macromolecules, plasmas, and nanomachines. Coverage of this book includes: Complex formation and dynamics of polymers Dynamics of lipid bilayers, peptides, DNA, RNA, and proteins Complex liquids and plasmas Dynamics of molecules on surfaces Nanofluidics and nanomachines

### **How to reference**

In order to correctly reference this scholarly work, feel free to copy and paste the following:

Joseph M. Hayes and Georgios Archontis (2012). MM-GB(PB)SA Calculations of Protein-Ligand Binding Free Energies, Molecular Dynamics - Studies of Synthetic and Biological Macromolecules, Prof. Lichang Wang (Ed.), ISBN: 978-953-51-0444-5, InTech, Available from: http://www.intechopen.com/books/moleculardynamics-studies-of-synthetic-and-biological-macromolecules/mm-gb-pb-sa-calculations-of-protein-ligandbinding-free-energies

## **InTech Europe**

University Campus STeP Ri Slavka Krautzeka 83/A 51000 Rijeka, Croatia Phone: +385 (51) 770 447 Fax: +385 (51) 686 166 www.intechopen.com

#### **InTech China**

Unit 405, Office Block, Hotel Equatorial Shanghai No.65, Yan An Road (West), Shanghai, 200040, China Phone: +86-21-62489820 Fax: +86-21-62489821

© 2012 The Author(s). Licensee IntechOpen. This is an open access article distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.